WebVoyager Baseline Agent & Benchmark #282

alckasoc · 2025-01-17T04:42:57Z

🤔 Reasoning

Explain the purpose of this PR...

🚧 Changes

Describe the changes made...

✅ PR Checklist

codecov · 2025-01-17T04:57:59Z

Codecov Report

Attention: Patch coverage is 8.13810% with 745 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...l/benchmarks/computer_use/webvoyager/webvoyager.py	0.00%	330 Missing ⚠️
...nchmarks/computer_use/webvoyager/utils_webarena.py	0.00%	187 Missing ⚠️
...benchmarks/computer_use/webvoyager/data_manager.py	0.00%	125 Missing ⚠️
...ential/benchmarks/computer_use/webvoyager/utils.py	0.00%	55 Missing ⚠️
...l/agents/computer_use/webvoyager_baseline/agent.py	0.00%	41 Missing ⚠️
...uter_use/webvoyager_baseline/strategies/general.py	89.39%	7 Missing ⚠️

❌ Your patch check has failed because the patch coverage (8.13%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (80.11%) is below the target coverage (95.00%). You can increase the head coverage or adjust the target coverage.

Files with missing lines	Coverage Δ
...ter_use/webvoyager_baseline/functional_webarena.py	`20.21% <100.00%> (+20.21%)`	⬆️
.../agents/computer_use/webvoyager_baseline/output.py	`100.00% <100.00%> (ø)`
agential/agents/expel/prompts.py	`100.00% <ø> (ø)`
...al/benchmarks/computer_use/osworld/data_manager.py	`96.19% <100.00%> (-0.96%)`	⬇️
...gential/benchmarks/computer_use/osworld/osworld.py	`21.66% <ø> (ø)`
...uter_use/webvoyager_baseline/strategies/general.py	`89.39% <89.39%> (ø)`
...l/agents/computer_use/webvoyager_baseline/agent.py	`0.00% <0.00%> (ø)`
...ential/benchmarks/computer_use/webvoyager/utils.py	`0.00% <0.00%> (ø)`
...benchmarks/computer_use/webvoyager/data_manager.py	`0.00% <0.00%> (ø)`
...nchmarks/computer_use/webvoyager/utils_webarena.py	`0.00% <0.00%> (ø)`
... and 1 more

... and 2 files with indirect coverage changes

…bvoy

alckasoc · 2025-02-02T01:15:41Z

agential/agents/computer_use/webvoyager_baseline/strategies/general.py

+        response = self.generate_thought(messages=messages, seed=seed)
+        prompt_tokens = response.prompt_tokens
+        completion_tokens = response.completion_tokens
+        gpt_4v_res = response.output_text


we arent using openai specifically. use the LLM class

alckasoc · 2025-02-02T01:16:25Z

agential/agents/computer_use/webvoyager_baseline/strategies/general.py

+            },
+        )
+
+    def reset(  ######## Fix documentation #############


are u absolutely sure there is no state within this agent?

alckasoc · 2025-02-02T01:17:16Z

agential/agents/computer_use/webvoyager_baseline/strategies/general.py

+        Returns:
+            Response: The generated output text from the model.
+        """
+        response = self.llm(messages, max_tokens=max_tokens, seed=seed, timeout=timeout)


structure this into _prompt_* functions

alckasoc · 2025-02-02T01:18:04Z

agential/agents/computer_use/webvoyager_baseline/strategies/general.py

+        self,
+        system_prompt: str,
+        system_prompt_text_only: str,
+        seed: int,
+        max_attached_imgs: int,
+        temperature: float,
+        text_only: bool,
+        task: Dict[str, Any],
+        obs: Dict[str, Any]


some of these are hyperparameters. some of them are parameters for the generate method. some of them shouldnt even be parameters

system prompts should never need to be passed in (refer to all the agents we've implemented thus far)

also doesn't this baseline webvoyager agent have a state? it keeps a history of all the messages. doesn't it? does your implementation consider that?

…bvoy

init

05173ee

alckasoc added enhancement New feature or request add-benchmark Adding support for a benchmark labels Jan 17, 2025

alckasoc assigned alckasoc and chuongnguyen26 Jan 17, 2025

alckasoc and others added 3 commits January 16, 2025 23:43

Merge branch 'main' into webvoy

4e88641

Merge branch 'main' into webvoy

b688565

add json examples

44533e8

alckasoc and others added 21 commits January 16, 2025 20:59

some files

0398c16

fix sorting error for get_task_ids_by_domain

38b9dbc

utils

66706f2

readme

c940f1a

.

50472b5

data manager

0901c92

data manager init

21c677f

ref answer getters

c17f086

getter

1dd32da

gaia data maanger

3936e71

auto lint

39dbd68

rename

02881aa

.

1448d05

base strategy for webvoy

975476b

Merge branch 'webvoy' of https://github.com/alckasoc/agential into we…

1528259

…bvoy

add selenium

07d32fa

webvoy general

bebead0

.

9d12c3f

Merge branch 'webvoy' of https://github.com/alckasoc/agential into we…

c22a610

…bvoy

some code

5900c53

finished general and output and finishing up agent

48ec17b

chuongnguyen26 and others added 24 commits January 18, 2025 02:39

reformatting

e6e1e9b

finishing up webvoyager integration

f8cc747

lotta changes

0e60364

fix base benchmark classes

c8fc7e2

lottac hanges

2e62441

ok

013ad35

.

d1492b8

.

68af977

add close

0863a93

clean up

049e61d

ok

cd2f10d

.

f64c7fe

some changes

95c610c

eval done

8196003

fix import

c6a8e7b

auto lint

8605017

.

7f042ae

experiment scripts

7564262

big changes

03dc444

finished agent

958190d

fix osworldnb

6ff93a6

Merge branch 'webvoy' of https://github.com/alckasoc/agential into we…

24e27da

…bvoy

add unittest

df6c60d

added test for general

3baa38a

alckasoc commented Feb 2, 2025

View reviewed changes

alckasoc added 5 commits February 2, 2025 03:22

fix

59849d0

Merge branch 'webvoy' of https://github.com/alckasoc/agential into we…

b4f787f

…bvoy

.

fda09fb

fix

b9c15f7

fix expel sh

a94ee8c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WebVoyager Baseline Agent & Benchmark #282

WebVoyager Baseline Agent & Benchmark #282

alckasoc commented Jan 17, 2025

codecov bot commented Jan 17, 2025 •

edited

Loading

alckasoc Feb 2, 2025

alckasoc Feb 2, 2025

alckasoc Feb 2, 2025

alckasoc Feb 2, 2025

alckasoc Feb 2, 2025

WebVoyager Baseline Agent & Benchmark #282

Are you sure you want to change the base?

WebVoyager Baseline Agent & Benchmark #282

Conversation

alckasoc commented Jan 17, 2025

🤔 Reasoning

🚧 Changes

✅ PR Checklist

codecov bot commented Jan 17, 2025 • edited Loading

Codecov Report

alckasoc Feb 2, 2025

Choose a reason for hiding this comment

alckasoc Feb 2, 2025

Choose a reason for hiding this comment

alckasoc Feb 2, 2025

Choose a reason for hiding this comment

alckasoc Feb 2, 2025

Choose a reason for hiding this comment

alckasoc Feb 2, 2025

Choose a reason for hiding this comment

codecov bot commented Jan 17, 2025 •

edited

Loading